21 research outputs found

    Exploring processor parallelism: Estimation methods and optimization strategies

    No full text
    Abstract—Automatic optimization of application-specific instruction-set processor (ASIP) architectures mostly focuses on the internal memory hierarchy design, or on the extension of reduced instruction-set architectures with complex custom operations. This paper focuses on very long instruction word (VLIW) architectures and, more specifically, on automating the selection of an application-specific VLIW issue-width. The issue-width selection strongly influences all the important processor properties (e.g. processing speed, silicon area, and power consumption). Therefore, accurate and efficient issue-width estimation and optimization are among the most important aspects of VLIW ASIP design. In this paper, we first compare different methods for the estimation of the required issue-width, and subsequently introduce a new force-based parallelism estimation method which is capable of estimating the required issue-width with only 3% error on average. Furthermore, we present and compare two techniques for estimating the required issue-width of software-pipelined loop kernels and show that a simple utilization-based measure provides an error margin of less than 1% on average.
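    A utilization-based issue-width measure of the kind the abstract mentions can be sketched as follows. This is an illustrative simplification, not the paper's actual formulation: the function name and the reduction to a single ceiling division are assumptions.

    ```python
    from math import ceil

    def utilization_issue_width(ops_per_iteration: int, initiation_interval: int) -> int:
        """Rough issue-width estimate for a software-pipelined loop kernel.

        With an initiation interval of II cycles between successive iterations,
        the kernel must issue ops_per_iteration operations every II cycles, so
        average utilization demands at least ceil(ops / II) parallel issue slots.
        """
        return ceil(ops_per_iteration / initiation_interval)

    # A hypothetical kernel with 14 operations per iteration and II = 4
    # needs at least ceil(14 / 4) = 4 issue slots.
    print(utilization_issue_width(14, 4))  # -> 4
    ```

    A force-based method would refine such an average by also weighing how operations can be moved within their scheduling slack, which a pure utilization measure ignores.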

    Exploration de l'espace des architectures pour des systèmes de traitement d'image, analyse faite sur des blocs fondamentaux de la rétine numérique

    No full text
    In this dissertation we present a method for running a Design Space Exploration oriented to the optimization of data transfer and storage management. The corresponding tool has been used as a front-end to HLS in order to help the user find an optimized memory micro-architecture. Our method is able to handle image processing applications with non-affine array references. It applies a paving which, on one hand, is based on a run-time dependence analysis and, on the other hand, uses disjoint, equal-by-translation blocks to partition the data and instruction sets. The non-affinity of the array references is taken into account by projecting the instruction paving onto the data paving. This method leads to a memory micro-architecture that is adapted to the non-affinity of the array references of the application and keeps the control of data transfers cheap, because the size of the transferred data blocks is invariant. In the context of high-level synthesis (HLS), which extracts a structural model from an algorithmic model, we propose solutions to optimize data access and transfer in the target hardware. A methodology for exploring the space of possible memory architectures has been developed; it finds a compromise between the amount of internal memory used and the temporal performance of the generated hardware. Two levels of optimization exist: 1) an architectural optimization, which consists of creating a memory hierarchy, and 2) an algorithmic optimization, which consists of partitioning the whole set of manipulated data so that only the data needed immediately are stored internally. For each possible partitioning, we solve the problem of scheduling the computations and mapping the data; at the end, we choose the Pareto-optimal solution(s).
We propose a tool, a front-end to the HLS, that applies the algorithmic optimization of point 2) to a user-specified image processing algorithm. The tool outputs an algorithmic model optimized for HLS, customizing a generic architecture.
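The paving and projection idea described above can be sketched in a few lines. This is a minimal illustration under assumed names: the functions, the grid shape, and the example dependence are hypothetical, not the thesis' implementation.

```python
def tile_grid(width, height, tw, th):
    """Partition a width x height data space into disjoint, equal-by-translation
    tiles of size tw x th (tiles at the border are clipped)."""
    return [(x, y, min(tw, width - x), min(th, height - y))
            for y in range(0, height, th)
            for x in range(0, width, tw)]

def input_footprint(out_tile, dependence):
    """Project one output (instruction) tile onto the input data space by
    evaluating the possibly non-affine dependence for every point of the tile,
    i.e. a run-time dependence analysis."""
    x0, y0, w, h = out_tile
    return {dependence(x, y)
            for y in range(y0, y0 + h)
            for x in range(x0, x0 + w)}

# Example with a non-affine reference: out[x][y] reads in[x*x % 8][y].
tiles = tile_grid(8, 8, 4, 4)                          # 4 disjoint 4x4 tiles
fp = input_footprint(tiles[0], lambda x, y: (x * x % 8, y))
```

Because the tiles are equal by translation, every transfer moves a block of the same size, which is what keeps the transfer control hardware cheap.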

    Rapid and accurate energy estimation of vector processing in VLIW ASIPs

    No full text
    Many modern applications in important application domains, such as communication, image and video processing, and multimedia, exhibit substantial data-level parallelism (DLP). Therefore, adequate exploitation of DLP is highly relevant. This paper focuses on effective and efficient exploitation of DLP for the synthesis of vector VLIW ASIP processors. We propose analytical energy models in order to rapidly estimate the energy consumption of a nested loop executed on a VLIW ASIP with respect to different vector widths. The models perform a rapid and relatively accurate energy consumption estimation by combining the relevant information on the application and the implementation technology. The analytical energy models are experimentally validated and the validation results are discussed.
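    The shape of such an analytical model can be illustrated with a toy sketch. The energy constants below are invented placeholders, not the paper's technology parameters; a real model would derive them from the application and the implementation technology.

    ```python
    from math import ceil

    def vector_energy(n_ops, w, e_issue=1.0, e_lane=0.5):
        """Toy energy estimate for n_ops data-parallel operations at vector width w.

        cycles = ceil(n_ops / w); every cycle pays a fixed issue/control energy
        plus per-lane datapath energy for all w lanes -- idle lanes in the final,
        partially filled cycle still cost energy, which penalizes over-wide vectors.
        """
        cycles = ceil(n_ops / w)
        return cycles * (e_issue + w * e_lane)

    # Sweep candidate vector widths and pick the cheapest for a 100-op loop body.
    widths = [1, 2, 4, 8, 16, 32]
    best = min(widths, key=lambda w: vector_energy(100, w))
    print(best, vector_energy(100, best))  # -> 16 63.0
    ```

    Even this toy model reproduces the qualitative trade-off: widening the datapath amortizes the fixed per-cycle cost, until under-utilized lanes in the last cycle start to dominate.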

    Design Space Exploration in Application-Specific Hardware Synthesis for Multiple Communicating Nested Loops

    No full text
    Application-specific MPSoCs are often used to implement high-performance data-intensive applications. MPSoC design requires a rapid and efficient exploration of the hardware architecture possibilities to adequately orchestrate the data distribution and the architecture of parallel MPSoC computing resources. Behavioral specifications of data-intensive applications are usually given in the form of loop-based sequential code, which requires parallelization and task scheduling for an efficient MPSoC implementation. Existing approaches in application-specific hardware synthesis use loop transformations to efficiently parallelize single nested loops and use Synchronous Data Flow graphs to statically schedule and balance the data production and consumption of multiple communicating loops. This creates a separation between data- and task-parallelism analyses, which can reduce the possibilities for throughput optimization in high-performance data-intensive applications. This paper proposes a method for a concurrent exploration of data and task parallelism when using loop transformations to optimize data transfer and storage mechanisms for both single and multiple communicating nested loops. This method provides orchestrated application-specific decisions on communication architecture, memory hierarchy and computing resource parallelism. It is computationally efficient and produces high-performance architectures.
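    The Synchronous Data Flow balancing that the abstract refers to rests on the SDF balance equations. A minimal sketch for a single edge between two communicating loops (the function name and the two-actor restriction are assumptions for illustration):

    ```python
    from math import gcd

    def repetition_vector(prod, cons):
        """Smallest (qA, qB) satisfying the SDF balance equation
        prod * qA == cons * qB for an edge A -> B, where actor A produces
        `prod` tokens per firing and actor B consumes `cons` per firing."""
        g = gcd(prod, cons)
        return cons // g, prod // g

    # If loop A produces 3 tokens per firing and loop B consumes 2,
    # a balanced static schedule fires A twice and B three times per period.
    qA, qB = repetition_vector(3, 2)   # (2, 3): 3*2 == 2*3 tokens exchanged
    ```

    Solving these equations per edge gives the static firing counts that keep production and consumption balanced, so buffers between the communicating loops stay bounded.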

    Exploration de l'espace des architectures mémoire pour des systèmes de traitement d'image avec références non affines aux données (application à des blocs fondamentaux d'un modèle de rétine numérique)

    No full text
    The aim of this PhD is to propose a methodology that improves data transfer and management for applications having non-affine array references. The target applications are iterative image processing algorithms which are non-recursive and have static dependences. These applications are well described by loop-based C code and can undergo high-level synthesis, which infers an RTL model from an input C code. The input code of the HLS can be optimized, through loop transformations, with respect to its data storage and management. In fact, in the frame of the polyhedral model, loop transformations enhance data locality and enable computation parallelism and data prefetching.
These transformations require that the array references be affine. We propose a method to apply data and operation partitioning for applications with non-affine array references. An exploration is run with different tilings of the input and output data spaces. The output tiling is then projected onto the input tiling. The output-tile computations are re-scheduled in order to minimize the internal memory or to optimize the temporal performance of the produced system. A mapping between the input tiles and the internal buffers is computed and, at the end, the best solutions in the analyzed set are chosen.
